
153 research outputs found

Efficient Elastic Net Regularization for Sparse Linear Models

Author: Elkan Charles
Lipton Zachary C.
Publication date: 02/07/2015

This paper presents an algorithm for efficient training of sparse linear models with elastic net regularization. Extending previous work on delayed updates, the new algorithm applies stochastic gradient updates to non-zero features only, bringing weights current as needed with closed-form updates. Closed-form delayed updates for the $\ell_1$, $\ell_{\infty}$, and rarely used $\ell_2$ regularizers have been described previously. This paper provides closed-form updates for the popular squared norm $\ell^2_2$ and elastic net regularizers. We provide dynamic programming algorithms that perform each delayed update in constant time. The new $\ell^2_2$ and elastic net methods handle both fixed and varying learning rates, and both standard stochastic gradient descent (SGD) and forward backward splitting (FoBoS). Experimental results show that on a bag-of-words dataset with 260,941 features, but only 88 nonzero features on average per training example, the dynamic programming method trains a logistic regression classifier with elastic net regularization over 2000 times faster than otherwise.
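The delayed-update idea in this abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes only the squared $\ell_2$ penalty, a constant learning rate, and logistic loss, so the shrinkage a weight missed over k skipped steps collapses to a single factor `(1 - eta * lam) ** k`. The function names `train_dense` and `train_lazy` and the parameters `eta` and `lam` are hypothetical, chosen for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_dense(data, dim, eta, lam):
    """Reference SGD: shrink every weight at every step (cost O(dim) per step)."""
    w = [0.0] * dim
    for x, y in data:  # x maps feature index -> value, y is 0 or 1
        p = sigmoid(sum(w[j] * v for j, v in x.items()))
        for j in range(dim):
            w[j] *= 1.0 - eta * lam  # regularization step for all weights
        for j, v in x.items():
            w[j] -= eta * (p - y) * v  # loss gradient is non-zero only here
    return w


def train_lazy(data, dim, eta, lam):
    """Delayed updates: touch only the non-zero features of each example."""
    w = [0.0] * dim
    applied = [0] * dim  # number of shrink steps already applied to w[j]
    shrink = 1.0 - eta * lam
    for t, (x, y) in enumerate(data):
        for j in x:
            # Bring w[j] current: apply all pending shrinks in closed form.
            w[j] *= shrink ** (t - applied[j])
            applied[j] = t
        p = sigmoid(sum(w[j] * v for j, v in x.items()))
        for j, v in x.items():
            w[j] = shrink * w[j] - eta * (p - y) * v
            applied[j] = t + 1
    # Final catch-up so every weight reflects all len(data) steps.
    for j in range(dim):
        w[j] *= shrink ** (len(data) - applied[j])
    return w
```

Both functions should return the same weights up to floating-point rounding, but the lazy version does work proportional to the number of non-zero features per example rather than to `dim`. With a varying learning rate the pending shrinkage is a product of different factors, which is where the abstract's dynamic programming (caching cumulative quantities) would come in; that part is not shown here.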

arXiv.org e-Print Archive

CiteSeerX

Modeling Word Burstiness Using the Dirichlet Distribution

Author: Elkan Charles
Kauchak David
Madsen Rasmus Elsborg
Publication date: 01/01/2005

Multinomial distributions are often used to model text documents. However, they do not capture well the phenomenon that words in a document tend to appear in bursts: if a word appears once, it is more likely to appear again. In this paper, we propose the Dirichlet compound multinomial model (DCM) as an alternative to the multinomial. The DCM model has one additional degree of freedom, which allows it to capture burstiness. We show experimentally that the DCM is substantially better than the multinomial at modeling text data, measured by perplexity. We also show using three standard document collections that the DCM leads to better classification than the multinomial model. DCM performance is comparable to that obtained with multiple heuristic changes to the multinomial model.
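To make the burstiness claim concrete, here is a small sketch that compares the multinomial with the standard Dirichlet compound multinomial (Polya) probability mass function on a toy two-word vocabulary. The function names and the parameter values are illustrative assumptions, not taken from the paper.

```python
from math import exp, lgamma, log


def log_multinomial(counts, theta):
    """Log-probability of a word-count vector under a multinomial."""
    out = lgamma(sum(counts) + 1)
    for x, p in zip(counts, theta):
        out += x * log(p) - lgamma(x + 1)
    return out


def log_dcm(counts, alpha):
    """Log-probability of a word-count vector under the DCM (Polya) model."""
    n, a = sum(counts), sum(alpha)
    out = lgamma(n + 1) + lgamma(a) - lgamma(n + a)
    for x, al in zip(counts, alpha):
        out += lgamma(x + al) - lgamma(al) - lgamma(x + 1)
    return out


# Toy setting (assumed values): two equally likely words, document length 2.
theta = (0.5, 0.5)
alpha = (0.1, 0.1)  # small alpha sum -> strongly bursty

bursty = (2, 0)  # the same word twice
mixed = (1, 1)   # one occurrence of each word

p_multi_bursty = exp(log_multinomial(bursty, theta))
p_dcm_bursty = exp(log_dcm(bursty, alpha))
p_dcm_mixed = exp(log_dcm(mixed, alpha))
```

In this toy setting the DCM assigns the repeated-word document a higher probability than the multinomial does, even though both models have the same expected word proportions. The sum of the `alpha` values plays the role of the extra degree of freedom: the smaller it is, the more probability mass moves toward documents that repeat words.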

CiteSeerX

Online Research Database In Technology

A Distributed Solution to the PTE Problem

Author: Borrajo Millán Daniel
Elkan Charles
Giráldez J. Ignacio
Publication venue: American Association for Artificial Intelligence (AAAI)
Publication date: 01/01/1999

Proceedings of: AAAI Spring Symposium on Predictive Toxicology, AAAI Press, Stanford, March 1999.

A wide panoply of machine learning methods is available for application to the Predictive Toxicology Evaluation (PTE) problem. The authors have built four monolithic classification systems based on Tilde, Progol, C4.5 and naive Bayesian classification. These systems have been trained using the PTE dataset, and their accuracy has been tested using the unseen PTE1 data set as the test set. A Multi Agent Decision System (MADES) has been built using the aforementioned monolithic systems to build classification agents. The MADES was trained and tested with the same data sets used with the monolithic systems. Results show that the accuracy of the MADES improves on the accuracies obtained by the monolithic systems. We believe that in most real world domains the combination of several approaches is stronger than any individual approach.

Introduction: The Predictive Toxicology Evaluation (PTE) Challenge (Srinivasan et al. 1997) was devised by the Oxford University Computing Laboratory to test the suitability ...

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Universidad Carlos III de Madrid e-Archivo